An Efficient Compiler for Weighted Rewrite Rules
Authors
Abstract
Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, given certain conditions, such rewrite rules can be represented as finite-state transducers (FSTs). We describe a new algorithm for compiling rewrite rules into FSTs. We show the algorithm to be simpler and more efficient than existing algorithms. Further, many of our applications demand the ability to compile weighted rules into weighted FSTs, transducers generalized by providing transitions with weights. We have extended the algorithm to allow for this.

1. Motivation

Rewrite rules are used in many areas of natural language and speech processing, including syntax, morphology, and phonology [1]. In interesting applications, the number of rules can be very large. It is then crucial to give a representation of these rules that leads to efficient programs.

Finite-state transducers provide just such a compact representation (Mohri, 1994). They are used in various areas of natural language and speech processing because their increased computational power enables one to build very large machines to model interestingly complex linguistic phenomena. They also allow algebraic operations such as union, composition, and projection which are very useful in practice (Berstel, 1979; Eilenberg, 1974-1976). And, as originally shown by Johnson (1972), rewrite rules can be modeled as finite-state transducers, under the condition that no rule be allowed to apply any more than a finite number of times to its own output.

[1] Parallel rewrite rules also have interesting applications in biology. In addition to their formal language theory interest, systems such as those of Aristid Lindenmayer provide rich mathematical models for biological development (Rozenberg and Salomaa, 1980).
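To make the idea of a rewrite rule modeled as a transducer concrete, the following is a minimal hand-built sketch, not the compilation algorithm described in this paper. It assumes a toy rule a → b / c _ (rewrite "a" as "b" immediately after "c") over the hypothetical alphabet {a, b, c}. The names `make_rule_fst` and `transduce` are illustrative only. Because the rule never outputs "c", matching the left context on the input side or the output side gives the same result here.

```python
def make_rule_fst(alphabet):
    """Build a deterministic transducer for the toy rule a -> b / c _.

    State 0: the previous symbol was not "c".
    State 1: the previous symbol was "c" (the left context is satisfied).
    The table maps (state, input symbol) to (output symbol, next state).
    """
    trans = {}
    for x in alphabet:
        next_state = 1 if x == "c" else 0
        # Outside the context, every symbol is copied unchanged.
        trans[(0, x)] = (x, next_state)
        if x == "a":
            # Inside the context, "a" is rewritten as "b".
            trans[(1, x)] = ("b", 0)
        else:
            trans[(1, x)] = (x, next_state)
    return trans


def transduce(trans, s, start=0):
    """Run the transducer over the string s and return its output string."""
    state = start
    out = []
    for ch in s:
        symbol, state = trans[(state, ch)]
        out.append(symbol)
    return "".join(out)


fst = make_rule_fst("abc")
result = transduce(fst, "cacab")
```

A rule with a right context would need lookahead or nondeterminism, which is one reason a general compilation algorithm is more involved than this two-state machine.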
Kaplan and Kay (1994), or equivalently Karttunen (1995), provide an algorithm for compiling rewrite rules into finite-state transducers, under the condition that they do not rewrite their noncontextual part [2]. We here present a new algorithm for compiling such rewrite rules which is both simpler to understand and implement, and computationally more efficient. Clarity is important since, as pointed out by Kaplan and Kay (1994), the representation of rewrite rules by finite-state transducers involves many subtleties. Time and space efficiency of the compilation are also crucial. Using naive algorithms can be very time consuming and lead to very large machines (Liberman, 1994).

In some applications such as those related to speech processing, one needs to use weighted rewrite rules, namely rewrite rules to which weights are associated. These weights are then used at the final stage of applications to output the most probable analysis. Weighted rewrite rules can be compiled into weighted finite-state transducers, namely transducers generalized by providing transitions with a weighted output, under the same context condition. These transducers are very useful in speech processing (Pereira et al., 1994). We briefly describe how we have augmented our algorithm to handle the compilation of weighted rules into weighted finite-state transducers.

In order to set the stage for our own contribution, we start by reviewing salient aspects of the Kaplan and Kay algorithm.

[2] The general question of the decidability of the halting problem even for one-rule semi-Thue systems is still open. Robert McNaughton (1994) has recently made a positive conjecture about the class of the rules without self overlap.
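How weights can select the most probable analysis may be illustrated with a small sketch. It is an assumption-laden toy, not the weighted extension described in this paper: the rule (after "c", rewrite "a" as "b" with probability 0.6 or as "d" with probability 0.4), the alphabet, and the names `weighted` and `best_analysis` are all hypothetical. Weights are stored as negative log probabilities, so they add along a path and the lowest total cost corresponds to the most probable output.

```python
import math

# Hypothetical weighted transducer: (state, input) -> list of
# (output, next state, cost), with cost = -log(probability).
# State 1 means the previous input symbol was "c".
weighted = {
    (0, "a"): [("a", 0, 0.0)],
    (0, "b"): [("b", 0, 0.0)],
    (0, "c"): [("c", 1, 0.0)],
    (1, "a"): [("b", 0, -math.log(0.6)), ("d", 0, -math.log(0.4))],
    (1, "b"): [("b", 0, 0.0)],
    (1, "c"): [("c", 1, 0.0)],
}


def best_analysis(trans, s, start=0):
    """Return (total cost, output) of the cheapest path for input s.

    Keeps only the cheapest hypothesis per state after each input symbol,
    which is safe because costs are additive along a path.
    Returns None if the input is not accepted.
    """
    best = {start: (0.0, "")}
    for ch in s:
        nxt = {}
        for state, (cost, out) in best.items():
            for symbol, target, weight in trans.get((state, ch), []):
                candidate = (cost + weight, out + symbol)
                if target not in nxt or candidate < nxt[target]:
                    nxt[target] = candidate
        best = nxt
    if not best:
        return None
    return min(best.values())


cost, output = best_analysis(weighted, "ca")
```

Using negative log probabilities turns the product of probabilities along a path into a sum, so the search is an ordinary shortest-path computation.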
Similar resources
An Efficient Compiler for Weighted Rewrite Rules
Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, given certain conditions, such rewrite rules can be represented as finite-state transducers (FSTs). We describe a new algorithm for compiling rewrite rules into FSTs. We show the algorithm to be simpler and more efficient than existing algorithms. ...
Re-Engineering Letter-to-Sound Rules
Using finite-state automata for the text analysis component in a text-to-speech system is problematic in several respects: the rewrite rules from which the automata are compiled are difficult to write and maintain, and the resulting automata can become very large and therefore inefficient. Converting the knowledge represented explicitly in rewrite rules into a more efficient format is difficult...
Lightweight Higher-Order Rewriting in Haskell
We present a generic Haskell library for expressing rewrite rules with a safe treatment of variables and binders. Both sides of the rules are written as typed EDSL expressions, which leads to syntactically appealing rules and hides the underlying term representation. Matching is defined as an instance of Miller’s pattern unification, which makes for efficient execution when rules are applied in...
Axioms as generic rewrite rules in C++ with concepts
Compilers are typically hardwired to attempt many optimizations only on expressions that involve particular built-in types. Ideally, however, an optimizing compiler would recognize a rewrite opportunity for user-defined types as well, whenever the operands of an expression satisfy the algebraic properties that justify the rewrite. This paper applies the principles and techniques of generic prog...
A Bimachine Compiler for Ranked Tagging Rules
This paper describes a novel method of compiling ranked tagging rules into a deterministic finite-state device called a bimachine. The rules are formulated in the framework of regular rewrite operations and allow unrestricted regular expressions in both left and right rule contexts. The compiler is illustrated by an application within a speech synthesis system.